Ada Refresher

Laura Dean

Learning Objectives – to refresh yourselves on

  • What resources are available to you on the HPC
  • How to log in to the HPC
  • How to write a job submission script
  • How to submit and manage a job using basic slurm commands
  • How to access software on the HPC
  • How to copy files between the HPC and your local machine
  • Where to go for further support

Reminder of your ‘Exploration tier’ user account limits

  • simultaneous use of up to 96 CPU cores

  • 360 GB memory (RAM)

  • Job lengths of up to 2 days

  • 1TB of storage in your home directory

Partitions on Ada

Logging in

  1. Connect to the VPN

  2. Open your terminal and type ssh ada
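The short form ssh ada assumes a host alias is configured on your laptop. If plain ssh ada does not resolve for you, an entry along these lines in ~/.ssh/config provides it (the HostName below is a placeholder – use the actual login-node address from your Ada account details):

```
Host ada
    HostName ada.nottingham.ac.uk   # placeholder - substitute the real login-node address
    User YOUR_USERNAME              # your university username
```

With this in place, ssh ada expands to a full ssh YOUR_USERNAME@<login-node> connection.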

Writing slurm scripts

  • Remember, your script will be submitted from Ada, so it needs to exist in your user account on Ada, not on your own laptop.

  • You can write the script from your terminal with a text editor such as nano or vim.

  • If you prefer to use a GUI text editor such as Sublime (Mac) or Notepad++ (Windows), then write the script in the editor on your laptop and copy and paste it into nano or vim at the command line to save it on Ada.

Components of a slurm script

#!/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=5g
#SBATCH --time=00:10:00
#SBATCH --job-name=my_job
#SBATCH --output=/gpfs01/home/mbzlld/slurm-%x-%j.out
#SBATCH --error=/gpfs01/home/mbzlld/slurm-%x-%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=mbzlld@exmail.nottingham.ac.uk

# Code for whatever you want your script to do HERE

The shebang - tells the operating system which interpreter to use when executing the script

--partition=

Select the partition, i.e. the type of compute node you want to use.

--nodes=

Select how many compute nodes you want to use (almost always this will be 1).

--ntasks=

Select how many separate tasks you want your job to be split across (for most jobs this will also be 1).

--cpus-per-task=

Select how many cores you want your job to use. This is where you can make a lot of things much faster by requesting more cores.

NOTE: you cannot request more cores than exist in the node or nodes you have requested.

--mem=

Select how much RAM to allocate to your job. This allows you to use far more memory than you would usually be able to access on a standard computer.

Units: g – gigabytes; m – megabytes

--time=

Select the maximum amount of time your job can run for.

Units: hh:mm:ss

NOTE: the different nodes have different maximum job times. No one can request more than 7 days (168 hours) on any node.

--job-name=

A short, useful name for you to identify your job (no spaces allowed!).

--output=

Name and location of the output file (all output that would be printed to the screen if you ran your script directly in the terminal is stored here).

%x = job name (taken from --job-name=)

%j = job ID (allocated upon submission)

NOTE: you cannot use ~ as shorthand for your home directory here.

--error=

OPTIONAL: name and location of the error file (any errors that would be printed to the screen if you ran your script directly in the terminal are stored here). If you do not specify a separate error file, errors are written to your output file.

--mail-type=ALL
--mail-user=

OPTIONAL: include both these lines and replace my username with yours if you want the system to send you an email when your job starts running and when it completes.

Finally, the main body of your script – the bit where you write what you want the script to do.
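Inside the job body you can pick up the resources requested in the #SBATCH lines from environment variables that Slurm sets, which keeps a tool's thread count in step with --cpus-per-task. A minimal sketch (the commented-out tool invocation is hypothetical – substitute your own program and its threads flag):

```shell
#!/bin/bash
# Slurm exports SLURM_CPUS_PER_TASK into the job's environment,
# so the script can match a tool's thread count to the cores requested.
THREADS="${SLURM_CPUS_PER_TASK:-1}"   # fall back to 1 when run outside Slurm
echo "Running with $THREADS threads"

# Hypothetical tool invocation - replace with your actual program:
# my_tool --threads "$THREADS" input.file
```

This way, changing --cpus-per-task=8 to --cpus-per-task=16 needs no edits to the body of the script.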

PRACTICE: write a script on Ada

Use nano or vim to create script.sh on Ada:

#!/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:10:00
#SBATCH --job-name=my_job
#SBATCH --output=/gpfs01/home/YOUR_USERNAME/slurm-%x-%j.out
#SBATCH --error=/gpfs01/home/YOUR_USERNAME/slurm-%x-%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=YOUR_USERNAME@exmail.nottingham.ac.uk

# We are going to practice doing a small task
echo "My username on Ada is $USER" > ~/my_script_worked.txt

# if the script ran successfully you will now have a new file
# in your home directory

NOTE: replace YOUR_USERNAME with your university username.

SLURM commands

  • submit your job to the slurm scheduler:

    sbatch script.sh
  • check your pending or running jobs in the queue: (replace YOUR_USERNAME)

    squeue -l -u YOUR_USERNAME

  • Cancel your job using the command: (replace job_id_number)

    scancel job_id_number

    (NOTE: the job ID number is shown in the output of squeue)
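The three commands above can be chained together. A sketch of a typical submit–check–cancel cycle on Ada, using sbatch's standard --parsable flag, which prints just the job ID so you don't have to copy it out of the "Submitted batch job" message by hand:

```shell
# Submit, check, and (if needed) cancel a job from Ada's login node.
jobid=$(sbatch --parsable script.sh)   # --parsable prints only the job ID
echo "Submitted job $jobid"

squeue -l -u "$USER"                   # is it PENDING (PD) or RUNNING (R)?

scancel "$jobid"                       # cancel by ID if you change your mind
```

Capturing the ID in a variable is also handy in wrapper scripts that submit several jobs in a row.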

Software modules

  • To list all available software modules:

    module avail

  • S = sticky (don’t unload), L = loaded, D = default (use this version)

Software modules

  • To search for software:

    module avail vcftools

  • To load software modules:

    module load vcftools-uoneasy/0.1.16-GCC-12.3.0
  • To unload software:

    module unload vcftools-uoneasy/0.1.16-GCC-12.3.0

Using software in a slurm script

#!/bin/bash

#SBATCH --partition=defq
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:05:00
#SBATCH --job-name=module_test
#SBATCH --output=/gpfs01/home/mbzlld/slurm-%x-%j.out
#SBATCH --error=/gpfs01/home/mbzlld/slurm-%x-%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=mbzlld@exmail.nottingham.ac.uk

# load a module
module load samtools-uoneasy/1.18-GCC-12.3.0

# do what you wanted to do with the software
# e.g. maybe index a file
samtools index ~/myfile.bam

# unload the module
module unload samtools-uoneasy/1.18-GCC-12.3.0

Requesting new software modules

Search for "HPC" in the service catalogue and submit a request:

https://uniofnottm.saasiteu.com/Modules/SelfService/#serviceCatalog

Software with conda

  • create a conda environment with python:

    conda create --name my_env python=3.10
  • activate your conda environment:

    conda activate my_env
  • install additional software in an active conda environment:

    conda install nanoplot
  • deactivate your conda environment:

    conda deactivate

Software with conda

  • to list all of your conda environments:

    conda env list
  • to list all software in an active conda environment:

    conda activate my_env
    conda list
  • to remove a conda environment (this cannot be undone):

    conda remove --name my_env --all

Using conda in a slurm script

You must source your bash profile to initialise conda; then use it as you would outside a slurm script.

#!/bin/bash

#SBATCH --partition=defq
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:05:00
#SBATCH --job-name=module_test
#SBATCH --output=/gpfs01/home/mbzlld/slurm-%x-%j.out
#SBATCH --error=/gpfs01/home/mbzlld/slurm-%x-%j.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=mbzlld@exmail.nottingham.ac.uk

# source your bash profile so you can use conda
source $HOME/.bash_profile

# activate conda environment
conda activate environment_name

# do some tasks within conda environment

# deactivate conda environment
conda deactivate

Moving data - files

  • scp - secure copy

  • scp must always be run from your local machine, not the remote machine – i.e. run scp from your own laptop, NOT while logged in to Ada

  • scp syntax: scp source destination

  • To copy a file from Ada to your current working directory

    scp ada:~/path/to/file/file.txt ./
  • To copy a file from your current working directory to Ada

    scp file.txt ada:~/path/to/dir/

Moving data - directories

  • To copy whole directories we use the same syntax, but include the -r flag to copy the directory recursively

  • To copy a directory from Ada to your current working directory

    scp -r ada:~/path/to/directory ./
  • To copy a directory from your current working directory to Ada

    scp -r directory ada:~/path/to/destination/

Where to go if you need help?